Learning to Optimize via Information-Directed Sampling
Similar resources
Learning to Optimize via Information-Directed Sampling
This paper proposes information-directed sampling, a new algorithm for balancing exploration and exploitation in online optimization problems in which a decision-maker must learn from partial feedback. The algorithm quantifies the amount learned by selecting an action through an information-theoretic measure: the mutual information between the true optimal action and the algorithm’s next...
Learning to Optimize via Posterior Sampling
Information Directed reinforcement learning
Efficient exploration is recognized as a key difficulty in reinforcement learning. We consider an episodic undiscounted MDP where the goal is to minimize the sum of regrets over different episodes. Classical methods are either based on optimism in the face of uncertainty or on probability matching. In this project we explore an approach that aims at quantifying the cost of exploration while rem...
A Note on Information-Directed Sampling and Thompson Sampling
This note introduces three Bayesian-style multi-armed bandit algorithms: information-directed sampling, Thompson sampling, and generalized Thompson sampling. The goal is to give an intuitive explanation of these three algorithms and their regret bounds, and to provide some derivations that are omitted from the original papers.
Learning to Optimize Plan Execution in Information Agents
We can build software agents to perform a wide variety of useful information gathering and monitoring tasks on the Web [1]. For example, in the travel domain, we can construct agents to notify you of flight delays in real time, monitor for schedule and price changes, and even send a fax to a hotel if your flight is delayed to ensure that your hotel room will not be given away [2,3]. To perform ...
Journal
Journal title: Operations Research
Year: 2018
ISSN: 0030-364X,1526-5463
DOI: 10.1287/opre.2017.1663